MARVIN: A platform for large-scale analysis of Semantic Web data

Authors

  • Eyal Oren
  • Spyros Kotoulas
  • George Anadiotis
  • Ronny Siebes
  • Annette ten Teije
  • Frank van Harmelen
Abstract

Web Science requires efficient techniques for analysing large datasets. Many Semantic Web problems are difficult to solve through common divide-and-conquer strategies, since they are hard to partition. We present MARVIN, a parallel and distributed platform for processing large amounts of RDF data on a network of loosely-coupled peers. We present our divide-conquer-swap strategy and show that this model converges towards completeness. We evaluate performance, scalability, load balancing and efficiency of our system.

I. ANALYSING WEB DATA

Web Science involves, amongst others, the analysis and interpretation of data and phenomena on the Web [9]. Since the datasets involved are typically very large, efficient techniques are needed for scalable execution of analysis jobs over these datasets. Traditionally, scaling computation through a divide-and-conquer strategy has been successful in a wide range of data analysis settings. Dedicated techniques, such as MapReduce [5], have been developed for analysing Web-scale data through a divide-and-conquer strategy.

In recent years, large volumes of Semantic Web data have become available, to the extent that the data is quickly outgrowing the capacity of storage systems and reasoning engines. Through the “linking open data” initiative, and through crawling and indexing infrastructures [13], datasets with millions or billions of triples are now readily available. These datasets contain RDF triples and many RDFS and OWL statements with implicit semantics [6]. From a Web Science viewpoint, these datasets are often more interesting than the Web graph [9] of page hyperlinks. First, since these datasets contain typed relations with particular meaning, they can be subjected to more detailed analysis. Secondly, most of these datasets are not annotated Web pages but rather interlinked exports of the “deep Web”, which has traditionally been hard to obtain and analyse [14].
However, to process, analyse, and interpret such datasets collected from the Web, infrastructure is needed that can scale to these sizes and can exploit the semantics in these datasets. In contrast to other analysis tasks concerning Web data, it is not clear how to solve many Semantic Web problems through divide-and-conquer, since it is hard to split the problem into independent partitions. To illustrate this problem we will focus on a common and typical problem: computing the deductive closure of these datasets through logical reasoning. Recent benchmarks [2, 8] show that current RDF stores can barely scale to the current volumes of data, even without this kind of logical reasoning.

II. SCALABLE RDF REASONING

To deal with massive volumes of Semantic Web data, we aim at building RDF engines that offer massively scalable reasoning. In our opinion, such scalability can be achieved by combining the following approaches:

  • using parallel hardware which runs distributed algorithms that exploit such hardware regardless of the scale, varying from tens of processors to many hundreds (as in our experiments) or even many thousands.
  • designing anytime algorithms that produce sound results where the degree of completeness increases over time. Such algorithms can trade the speed with which the inference process converges to completeness against the size of the dataset, while still guaranteeing eventual completeness.
  • our novel divide-conquer-swap strategy, which extends the traditional approach of divide-and-conquer with an iterative procedure whose results converge towards completeness over time.

We have implemented our approach in MARVIN, a parallel and distributed platform for processing large amounts of RDF data. MARVIN consists of a network of loosely-coupled machines using a peer-to-peer model and does not require splitting the problem into independent subparts.
MARVIN is based on the approach of divide-conquer-swap: peers autonomously partition the problem in some manner, each peer operates on its subproblem to find partial solutions, and then re-partitions its part and swaps it with another peer; all peers keep re-partitioning, solving, and swapping until all solutions are found. We show that this model is sound, converges, and eventually reaches completeness.
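
The divide-conquer-swap loop described above can be illustrated with a toy single-process simulation. This is a sketch under stated assumptions, not MARVIN's implementation: the function names `local_closure` and `divide_conquer_swap`, the single inference rule (transitivity of `rdfs:subClassOf`), the uniformly random routing of triples between peers, and the fixed number of rounds are all choices made here for illustration.

```python
import random

SUBCLASS = "rdfs:subClassOf"  # the only predicate this toy reasoner understands


def local_closure(triples):
    """Conquer step: apply subClassOf transitivity to one set of triples
    until no new triple can be derived from that set alone."""
    closed = set(triples)
    while True:
        derived = {
            (a, SUBCLASS, c)
            for (a, p, b) in closed if p == SUBCLASS
            for (b2, q, c) in closed if q == SUBCLASS and b2 == b
        } - closed
        if not derived:
            return closed
        closed |= derived


def divide_conquer_swap(triples, n_peers=3, rounds=200, seed=0):
    """Toy simulation (hypothetical, for illustration only).
    Returns the union of all peers' triples and the number of distinct
    triples known to the network after each round."""
    rng = random.Random(seed)

    # Divide: scatter the input triples over the peers at random.
    peers = [set() for _ in range(n_peers)]
    for t in sorted(triples):
        peers[rng.randrange(n_peers)].add(t)

    history = []
    for _ in range(rounds):
        # Conquer: every peer reasons over its own partition only.
        peers = [local_closure(p) for p in peers]
        history.append(len(set().union(*peers)))

        # Swap: every peer re-partitions its triples and sends each one
        # to a randomly chosen peer (possibly itself).
        inboxes = [set() for _ in range(n_peers)]
        for p in peers:
            for t in sorted(p):
                inboxes[rng.randrange(n_peers)].add(t)
        peers = inboxes

    return set().union(*peers), history


chain = {("A", SUBCLASS, "B"), ("B", SUBCLASS, "C"), ("C", SUBCLASS, "D")}
result, history = divide_conquer_swap(chain)
```

In this sketch every intermediate state is sound, because each peer only derives triples that follow from triples it actually holds, and `history` never decreases, which mirrors the anytime behaviour described in the list of approaches. Completeness is only reached eventually: two triples that jointly entail a third must first land on the same peer, so the number of rounds needed depends on the routing policy and the number of peers.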

Similar resources

Marvin: Distributed reasoning over large-scale Semantic Web data

Many Semantic Web problems are difficult to solve through common divide-and-conquer strategies, since they are hard to partition. We present Marvin, a parallel and distributed platform for processing large amounts of RDF data, on a network of loosely-coupled peers. We present our divide-conquer-swap strategy and show that this model converges towards completeness. Within this strategy, we addre...

Semantic Constraint and QoS-Aware Large-Scale Web Service Composition

Service-oriented architecture facilitates the running time of interactions by using business integration on the networks. Currently, web services are considered as the best option to provide Internet services. Due to an increasing number of Web users and the complexity of users’ queries, simple and atomic services are not able to meet the needs of users; and to provide complex services, it requ...

Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems

Ontology is the main infrastructure of the Semantic Web which provides facilities for integration, searching and sharing of information on the web. Development of ontologies as the basis of semantic web and their heterogeneities have led to the existence of ontology matching. By emerging large-scale ontologies in real domain, the ontology matching systems faced with some problem like memory con...

designing and implementing a 3D indoor navigation web application

During the recent years, the need arises for indoor navigation systems for guidance of a client in natural hazards and fire, due to the fact that human settlements have been complicating. This research paper aims to design and implement a visual indoor navigation web application. The designed system processes CityGML data model automatically and then, extracts semantic, topologic and geometric...

AHP Techniques for Trust Evaluation in Semantic Web

The increasing reliance on information gathered from the web and other internet technologies raise the issue of trust. Through the development of semantic Web, One major difficulty is that, by its very nature, the semantic web is a large, uncensored system to which anyone may contribute. This raises the question of how much credence to give each resource. Each user knows the trustworthiness of ...

Publication date: 2009